library(mosaic)
library(plotly)
library(DT)
library(pander)
library(car)
library(tidyverse)

 

## Wide data from original article:
SAT_Scores <- read.csv("../../Data/SAT_Scores.csv", header=TRUE) 

Background

Ednium is a non-profit organization in Denver, Colorado that is aimed at helping improve our K-12 education system in Denver in order to have students from Denver be prepared and ready for college. I was able to be a part of this organization and help out with projects and plans to pass which will improve the public education system in the Denver County 1 school district. The project I have worked on involves gathering SAT score averages from schools in four different school districts. These four different school districts are: Denver County 1, Cherry Creek 5, Jefferson County R-1, and Colorado Springs 11. I purposefully compare the Denver County 1 school districts with these other school districts because they have large schools and many people consider the Cherry Creek 5 and Jefferson County R-1 school districts to be some of the best. I purposefully picked these school districts because been in the Denver County 1 school district from K-12, I know that this school district lacks in preparing students for college and needs a lot of support and help. I decided to conduct a one-way ANOVa to see if there is a statistically significant difference between the Denver County 1 school district and the other districts.

Introduction

I have always been passionate about improving the education in Denver, Colorado which is why I joined the Ednium team. The research I have conducted is meant to help those in Ednium be able to have solid statistical evidence that the Denver County 1 school district needs change. I believe everyone should be aware of the education disparity which exists. Before I continue on with my study, I know that tests such as the ACT or SAT do not measure how smart a student is. BUT, what it does measure is some of the NEEDS of students, which is why I decided to do my test on average SAT scores in the schools of the four previously mentioned school districts.

The Data

Below is the data which I used from the Colorado Department of Education. The Colorado Department of Education publishes public data to help us know how each school district and school is performing. The data I used shows the 2019 average SAT scores for each high school in each of the four school districts. Therefore, each data point represents a high school and that high schools’ 2019 average SAT score. Instead of putting the names of the high schools, I put the name of the school district the high school belongs to because this ANOVA is comparing school districts.

datatable(SAT_Scores, options=list(lengthMenu = c(15,15,15)), extensions="Responsive")

Question

The question I am trying to answer is: Is there a significant difference in 2019 average SAT scores for the Denver County 1 school district when compared to the other three school districts?

Hypothesis

My null hypothesis states that the 2019 average SAT scores for the school districts are the same. My alternative hypothesis states that the average SAT score for at least one school district is different. My level of significance is 0.05.

\[ H_0: \mu_\text{DenverCounty1} = \mu_\text{CherryCreek5} = \mu_\text{ JeffersonCountyR1} = \mu_\text{ ColoradoSprings11} = \mu \]

\[ H_a: \text{at least one mean differs} \]

\(\alpha = 0.05\)

Diagnostic Plots

The diagnostic plots are here to check to make sure the assumptions for ANOVA are met, which they are. Based on my residuals vs fitted plot, we can say that there is a constant variance. If we look at the QQplot, most of the data is within the dashed boundary lines. Some of the points are on one of the dashed boundary lines and a few are outside of the dashed boundary lines. However, this is very little. It is still safe to say the data is normally distributed.

Omit1 <- filter(SAT_Scores, !is.na(School_District_Names))

Omit1 <- filter(Omit1, !is.na(SAT.Scores))

SAT.aov <- aov(SAT.Scores ~ School_District_Names, data=Omit1)

par(mfrow=c(1,2))
plot(SAT.aov, which=1, pch=16)
qqPlot(SAT.aov$residuals, id=FALSE)
mtext(side=3,text="Q-Q Plot of Residuals")

Data Analysis

As you can see below, I conducted the ANOVA test which shows there is at least one school district that differs in its 2019 mean SAT score. The P-value is significant, it is 4.723e-14.

summary(SAT.aov) %>%
  pander()
Analysis of Variance Model
  Df Sum Sq Mean Sq F value Pr(>F)
School_District_Names 3 834540 278180 23.3 4.723e-14
Residuals 458 5468188 11939 NA NA

Graphical Summary

For the graphical summary, I decided to do a boxplot that shows the min, Q1, median, Q3, and max for average SAT scores for each school district. As you can see, the Denver County 1 school district has the lowest median average SAT score at 864. For Colorado Springs it is 880, which is really close to the lowest median average SAT score of the Denver County 1 school district. The median average SAT score for the Cherry Creek 5 school district is 980, and for Jefferson County R-1 it is 971. Cherry Creek 5 and Jefferson County R-1 performed a lot better than the Denver County 1 school district and the Colorado Springs 11 school district.

plot_ly(Omit1, y=~SAT.Scores, x=~as.factor(School_District_Names), type='box', fillcolor='darkblue', line=list(color='darkgray', width=3), marker = list(color='orange', line = list(color='red', width=1))) %>%
layout(title='2019 Average SAT Scores for \n Four School Districts in Colorado', xaxis=list(title='School District'), yaxis=list(title='Average SAT Score'))
#xyplot(SAT.Scores ~ as.factor(School_District_Names), data=Omit1, xlab='School District', type=c("p","a"), ylab= 'Average SAT Scores', main='Average SAT Scores Per School District', col='blue')

Numerical Summary

Sample Size

Below is my sample size.

summary(as.factor(SAT_Scores$School_District_Names)) %>% 
  pander()
CherryCreek5 ColoradoSprings11 DenverCounty1 JeffersonCountyR1
37 48 241 136

For my first numerical summary, I did a TukeyHSD of the ANOVA test to see the differences of 2019 average SAT scores between school districts. If you take a look closely, when you compare the Denver County 1 Average SAT scores to Cherry Creek 5 and to Jefferson County R-1, there is a significant difference because the p-values are below .05 (my alpha).

TukeyHSD(SAT.aov) %>% 
  pander()
  • School_District_Names:

      diff lwr upr p adj
    ColoradoSprings11-CherryCreek5 -95.13 -156.8 -33.49 0.0004645
    DenverCounty1-CherryCreek5 -106.1 -155.8 -56.35 3.794e-07
    JeffersonCountyR1-CherryCreek5 -22.23 -74.48 30.01 0.6913
    DenverCounty1-ColoradoSprings11 -10.97 -55.5 33.56 0.9206
    JeffersonCountyR1-ColoradoSprings11 72.9 25.6 120.2 0.0004757
    JeffersonCountyR1-DenverCounty1 83.87 53.65 114.1 1.753e-11

You can visually see what the TukeyHSD data shows of the difference in 2019 average SAT scores between school districts by looking at this “95% family-wise confidence level” plot. This is showing that we are 95% confident of our results for the difference in average SAT scores between school districts. The closer the school districts are to zero in the x-axis, the less the difference in average SAT scores between them. Jefferson County R-1 and Cherry Creek 5 perform significantly better than the Denver County 1 school district and the Colorado Springs 11 school district.

par(mar=c(4,15,4,1))
plot(TukeyHSD(SAT.aov), las=1, cex.axis=.9)

Below my plot, you will see the summary I created of the data I gathered from the Colorado Department of Education. If you look at the average SAT score for the Denver County 1 school district (884.7), this is way lower than the average SAT score for the Cherry Creek 5 school district (990.8) and the Jefferson County R-1 school district (968.5).

SAT_Scores %>% 
   group_by(School_District_Names) %>% 
   summarise(min = min(SAT.Scores), Q1 = quantile(SAT.Scores, 0.25), mean = mean(SAT.Scores), Q3 = quantile(SAT.Scores, 0.75), max = max(SAT.Scores)) %>% 
  pander()
School_District_Names min Q1 mean Q3 max
CherryCreek5 795 916 990.8 1057 1191
ColoradoSprings11 716 836.5 895.6 963.2 1076
DenverCounty1 660 807 884.7 959 1234
JeffersonCountyR1 730 894.8 968.5 1041 1285

Interpretation/Conclusion

From our results, we can confidently say that the Denver County 1 school district and the Colorado Springs 11 school district need much support in order to catch up to the Cherry Creek 5 school district and the Jefferson County R-1 school district. As educators and organizations strive to help the Denver County 1 school district, we ourselves should be “all in” and support as much as we can. There has been great progress in getting rid of the gap. However, there is still much more to do. The time for us to act is now.